-
Notifications
You must be signed in to change notification settings - Fork 917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[REVIEW] Fix slow decompression performance #7951
Conversation
Can you also post the results with CUDA 10.2? The nanosleep issue is in driver versions 450.80.02 to 450.110 but only visible in CUDA 10.2 because CUDA 11 nvcc patches it. |
I tested this against the files that were causing issues on by dev box that caused me to file #7114. This fixes them. The files that took 1.8 seconds before now finish in 90 ms, and a complete build of the spark plugin with tests went from 38 mins back to 30. |
Wait, does this still target branch 0.19? Should branch 0.19 be frozen? |
It's a hotfix. |
We're actually going to target this to branch-0.18 to hotfix 0.18 for this and then make sure it gets forward applied to 0.19 and 0.20. |
Alright, tried targeting |
@elstehle Can you please post the performance after your current commit where you have removed nanosleep. |
@revans2 if you could test again with the latest changes it would appreciated as well! |
I did a quick test with it and it looks good. I don't have the exact numbers right now, but I will try to get them |
This time tested with CUDA 10.2 (driver : (T4, CUDA 10.2, driver 450.102.04)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elstehle should probably emphasise that the performance actually improved for non affected driver versions as well. 16 ms -> 13 ms
Yup the numbers are about the same. about 93 ms for reading the file that used to take 1.8 seconds and 31 mins to run the full set of tests instead of 38. Don't know if it is run to run variance or not, but either way it is a lot better. |
See #7951 Authors: - Elias Stehle (https://github.com/elstehle)
Closing this in favor of #7975 which merges the commits from branch-0.18 into branch-0.19. |
Fixes #7114
No performance regression on Turing (T4, CUDA 11.0, driver 450.102.04):
Click here to see environment details